ARMA/ARIMA/SARIMA Models

Author

Xiaojing Ni

Recap

In the EDA session, we explored the ACF and PACF of the detrended daily stream flow; however, the Augmented Dickey-Fuller test showed that even after decomposition, the remainder component is still non-stationary.

Log-transform

Code
autoplot(discharge_ts, main="Stream flow discharge at Waterville, OH station")
autoplot(log(discharge_ts), main="Log-transformed stream flow discharge at Waterville, OH station")

(a) Original stream flow

(b) Log-transformed stream flow

Figure 1: Log transform of the data

Figure 1 shows that the data should be log-transformed: the raw series is strongly right-skewed, with spread that grows with the level, and the log transform stabilizes it. Thus, unlike the analysis in the EDA session, the log-transformed data are used here.
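The effect of the transform can be illustrated on simulated data (a synthetic series, not the Waterville record): when a series' spread grows with its level, taking logs makes the halves of the series comparable again.

```r
# Illustration with simulated data (not the actual discharge record):
# a positive series whose spread grows with its level.
set.seed(42)
level <- exp(seq(0, 3, length.out = 400))   # rising level
x <- level * exp(rnorm(400, sd = 0.3))      # multiplicative noise

# Compare the spread of the first and second half, before and after the log
spread <- function(z) diff(range(z))
ratio_raw <- spread(x[201:400]) / spread(x[1:200])
ratio_log <- spread(log(x)[201:400]) / spread(log(x)[1:200])

ratio_raw  # spread is several times larger in the second half
ratio_log  # roughly comparable halves after the log transform
```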

ACF and PACF

Code
### can get monthly data
# Get mean value for each month
discharge_df["log_discharge"] <- log(discharge_df["discharge.discharge"])

mean_data <- discharge_df %>% 
  mutate(month = month(discharge.date), year = year(discharge.date)) %>% 
  group_by(year, month) %>% 
  summarize(mean_value = mean(log_discharge))

discharge_month <- ts(mean_data$mean_value, start = decimal_date(as.Date("1970-01-01", format = "%Y-%m-%d")), frequency = 12)

discharge_month2022 <- window(discharge_month,start=1980,end=2022)
decompose_discharge <- decompose(as.ts(discharge_month2022), "additive")
plot(decompose_discharge)

Figure 2: Decomposition of monthly stream flow

Code
## ACF

ggAcf(discharge_month,400)
ggPacf(discharge_month,400)

(a) ACF plot for monthly streamflow 1970-2022

(b) PACF plot for monthly streamflow 1970-2022

Figure 3: ACF and PACF plots

Code
################ ADF Test #############
tseries::adf.test(discharge_month2022)

    Augmented Dickey-Fuller Test

data:  discharge_month2022
Dickey-Fuller = -9.366, Lag order = 7, p-value = 0.01
alternative hypothesis: stationary

The ADF test on the monthly logged data rejects the null hypothesis of non-stationarity (p = 0.01); however, the ACF in Figure 3 still shows strong, slowly decaying seasonal autocorrelation, so differencing is examined next.

Difference

Code
######### see the differenced series #########
# first order
df.ts.discharge<- discharge_month %>% diff() 
df.ts.discharge%>% ggtsdisplay()
Warning: Removed 3 rows containing missing values (`geom_point()`).

Figure 4: First order differencing of monthly stream flow

Code
## second order
df.ts.discharge %>% diff() %>% diff() %>% ggtsdisplay()
Warning: Removed 3 rows containing missing values (`geom_point()`).

Figure 5: Second order differencing of monthly stream flow

Code
## second order
df.ts.discharge %>% diff() %>% diff() %>% diff() %>% ggtsdisplay()
Warning: Removed 3 rows containing missing values (`geom_point()`).

Figure 6: Third order differencing of monthly stream flow

Code
################ ADF Test #############
df.ts.discharge1980 <- window(discharge_month,start=1980,end=2021)
tseries::adf.test(df.ts.discharge1980)

    Augmented Dickey-Fuller Test

data:  df.ts.discharge1980
Dickey-Fuller = -9.13, Lag order = 7, p-value = 0.01
alternative hypothesis: stationary

Figure 5 suggests that second order differencing may be needed for this data: first order differencing (Figure 4) is not enough, while third order (Figure 6) does not improve much over second order. Note that the ADF test above is applied to the undifferenced 1980-2021 window and rejects the unit-root null (p = 0.01); nevertheless, the strong spikes at seasonal lags indicate a seasonal pattern that still needs to be considered. We will go with second order differencing for now.
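The intuition behind differencing can be checked on a simulated random walk (illustrative only, not the discharge data): the walk's ACF decays very slowly, a unit-root signature, while its first difference behaves like white noise.

```r
# Simulated illustration: differencing removes a unit root
set.seed(1)
rw <- cumsum(rnorm(500))  # random walk: non-stationary
dw <- diff(rw)            # first difference: white noise

acf_rw <- acf(rw, lag.max = 10, plot = FALSE)$acf[11]  # autocorrelation at lag 10
acf_dw <- acf(dw, lag.max = 1,  plot = FALSE)$acf[2]   # autocorrelation at lag 1

acf_rw  # stays close to 1: slow decay, typical of a unit root
acf_dw  # near 0 after differencing
```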

From the ACF plot, the MA order q could be 1 or 2; from the PACF plot, the AR order p could be as high as 7.
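This read-off of p from the PACF can be sanity-checked on a simulated AR(2) process (synthetic data, not the streamflow series): its PACF cuts off after lag 2 while its ACF tails off.

```r
# Synthetic check of order identification: an AR(2) has a PACF cutoff at lag 2
set.seed(7)
ar2 <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 1000)

p_acf <- pacf(ar2, lag.max = 10, plot = FALSE)$acf  # partial autocorrelations, lags 1..10
p_acf[1:2]             # large: suggests p = 2
max(abs(p_acf[3:10]))  # small: lags beyond 2 are negligible
```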

ARIMA model

Code
i = 1
d = 1
temp = data.frame()
ls = matrix(rep(NA, 6 * 40), nrow = 40)

for (p in 1:8)      # p = 0,1,...,7
{
  for (q in 1:3)    # q = 0,1,2
  {
    for (d in 1:2)  # d = 1,2
    {
      if (p - 1 + d + q - 1 <= 8)
      {
        skip_to_next <- FALSE
        model <- tryCatch({
          Arima(df.ts.discharge1980, order = c(p-1, d, q-1), include.drift = FALSE)
        }, error = function(e) { skip_to_next <<- TRUE })
        if (skip_to_next) { next }
        ls[i, ] = c(p-1, d, q-1, model$aic, model$bic, model$aicc)
        i = i + 1
      }
    }
  }
}




temp= as.data.frame(ls)
names(temp)= c("p","d","q","AIC","BIC","AICc")

#temp
knitr::kable(temp)
p d q AIC BIC AICc
0 1 0 1357.648 1361.846 1357.656
0 2 0 1769.527 1773.724 1769.535
0 1 1 1347.007 1355.404 1347.031
0 2 1 1364.089 1372.482 1364.114
0 1 2 1348.832 1361.427 1348.881
0 2 2 1353.817 1366.406 1353.866
1 1 0 1346.593 1354.989 1346.617
1 2 0 1564.562 1572.955 1564.586
1 1 1 1348.447 1361.042 1348.496
1 2 1 1353.359 1365.948 1353.408
1 1 2 1349.204 1365.997 1349.286
1 2 2 1355.176 1371.961 1355.258
2 1 0 1348.500 1361.095 1348.549
2 2 0 1508.410 1521.000 1508.460
2 1 1 1349.326 1366.120 1349.409
2 2 1 1355.237 1372.022 1355.319
2 1 2 1263.736 1284.728 1263.859
2 2 2 1356.069 1377.051 1356.192
3 1 0 1349.660 1366.453 1349.742
3 2 0 1489.302 1506.087 1489.384
3 1 1 1241.035 1262.028 1241.159
3 2 1 1356.477 1377.459 1356.600
3 1 2 1209.143 1234.334 1209.316
3 2 2 1358.036 1383.215 1358.210
4 1 0 1346.226 1367.218 1346.349
4 2 0 1484.882 1505.864 1485.005
4 1 1 1219.709 1244.900 1219.883
4 2 1 1353.250 1378.429 1353.424
4 1 2 1204.395 1233.784 1204.626
4 2 2 1236.064 1265.439 1236.296
5 1 0 1329.687 1354.878 1329.860
5 2 0 1479.310 1504.489 1479.484
5 1 1 1203.909 1233.299 1204.141
5 2 1 1337.097 1366.472 1337.329
5 1 2 1201.744 1235.332 1202.042
6 1 0 1309.947 1339.336 1310.178
6 2 0 1477.736 1507.111 1477.968
6 1 1 1200.849 1234.437 1201.147
7 1 0 1266.364 1299.951 1266.662
Code
temp[which.min(temp$AIC),]
   p d q      AIC      BIC     AICc
38 6 1 1 1200.849 1234.437 1201.147
Code
temp[which.min(temp$AICc),]
   p d q      AIC      BIC     AICc
38 6 1 1 1200.849 1234.437 1201.147
Code
temp[which.min(temp$BIC),]
   p d q      AIC      BIC     AICc
33 5 1 1 1203.909 1233.299 1204.141
Code
######### compare 2 models

model_output <- capture.output(sarima(df.ts.discharge1980,6,1,1))
cat(model_output[53:81], model_output[length(model_output)], sep = "\n") 
Coefficients:
         ar1     ar2      ar3      ar4      ar5      ar6      ma1  constant
      0.5203  0.1032  -0.0665  -0.0945  -0.1351  -0.1040  -1.0000     3e-04
s.e.  0.0448  0.0504   0.0505   0.0505   0.0505   0.0451   0.0067     3e-04

sigma^2 estimated as 0.6407:  log likelihood = -591.99,  aic = 1201.99

$degrees_of_freedom
[1] 484

$ttable
         Estimate     SE   t.value p.value
ar1        0.5203 0.0448   11.6064  0.0000
ar2        0.1032 0.0504    2.0486  0.0410
ar3       -0.0665 0.0505   -1.3159  0.1888
ar4       -0.0945 0.0505   -1.8719  0.0618
ar5       -0.1351 0.0505   -2.6766  0.0077
ar6       -0.1040 0.0451   -2.3046  0.0216
ma1       -1.0000 0.0067 -148.3871  0.0000
constant   0.0003 0.0003    0.9217  0.3572

$AIC
[1] 2.443064

$AICc
[1] 2.44367

$BIC
[1] 2.519866

Figure 7: Model 1 diagnosis (6,1,1)

Code
model_output2 <- capture.output(sarima(df.ts.discharge1980,5,1,1))
cat(model_output2[47:74], model_output2[length(model_output2)], sep = "\n")
Coefficients:
         ar1     ar2      ar3      ar4      ar5      ma1  constant
      0.5407  0.1139  -0.0596  -0.1062  -0.1910  -1.0000     3e-04
s.e.  0.0442  0.0504   0.0507   0.0505   0.0445   0.0063     4e-04

sigma^2 estimated as 0.648:  log likelihood = -594.63,  aic = 1205.27

$degrees_of_freedom
[1] 485

$ttable
         Estimate     SE   t.value p.value
ar1        0.5407 0.0442   12.2310  0.0000
ar2        0.1139 0.0504    2.2583  0.0244
ar3       -0.0596 0.0507   -1.1756  0.2403
ar4       -0.1062 0.0505   -2.1033  0.0360
ar5       -0.1910 0.0445   -4.2929  0.0000
ma1       -1.0000 0.0063 -159.2364  0.0000
constant   0.0003 0.0004    0.7963  0.4263

$AIC
[1] 2.44973

$AICc
[1] 2.4502

$BIC
[1] 2.517998

Figure 8: Model 2 diagnosis (5,1,1)

From the results and diagnostics (Figure 7 and Figure 8), ARIMA(6,1,1) and ARIMA(5,1,1) have the lowest AIC/AICc and BIC, respectively, and both models perform well in the diagnostics. Since ARIMA(5,1,1) has fewer parameters, it is chosen as the best model. With \(x_t\) denoting the first difference of the monthly log discharge (d = 1), the fitted equation is: \[ x_t = 0.5407x_{t-1}+0.1139x_{t-2}-0.0596x_{t-3}-0.1062x_{t-4}-0.1910x_{t-5}+\omega_t-\omega_{t-1} \]
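Because d = 1, the AR and MA coefficients describe the first-differenced series. A minimal sketch on simulated data (not the discharge record) confirms this with base R's arima(): fitting ARIMA(1,1,1) to a series gives essentially the same coefficients as fitting ARMA(1,1) to its first difference.

```r
# Sketch on simulated data: ARIMA(p,1,q) on x is ARMA(p,q) on diff(x)
set.seed(11)
x <- cumsum(arima.sim(model = list(ar = 0.6, ma = 0.4), n = 600))

fit_d1   <- arima(x,       order = c(1, 1, 1))                        # d = 1 on the level
fit_diff <- arima(diff(x), order = c(1, 0, 1), include.mean = FALSE)  # ARMA on the difference

coef(fit_d1)
coef(fit_diff)  # coefficients agree closely
```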

auto.arima()

Using auto.arima() gives the results below. The selected model differs from the one chosen above because auto.arima() uses different criteria to choose the best model, i.e., a combination of unit-root tests, minimization of the AICc, and MLE (source).

Code
auto.arima(df.ts.discharge1980,seasonal = FALSE)
Series: df.ts.discharge1980 
ARIMA(2,0,2) with non-zero mean 

Coefficients:
         ar1      ar2      ma1     ma2    mean
      1.6474  -0.9023  -1.2396  0.5716  7.8237
s.e.  0.0318   0.0325   0.0754  0.0682  0.0466

sigma^2 = 0.6339:  log likelihood = -585.33
AIC=1182.65   AICc=1182.83   BIC=1207.86
Code
model_output3 <- capture.output(sarima(df.ts.discharge1980,2,0,2))
cat(model_output3[60:85], model_output3[length(model_output3)], sep = "\n")
Coefficients:
         ar1      ar2      ma1     ma2   xmean
      1.6474  -0.9023  -1.2396  0.5716  7.8237
s.e.  0.0318   0.0325   0.0754  0.0682  0.0466

sigma^2 estimated as 0.6274:  log likelihood = -585.33,  aic = 1182.65

$degrees_of_freedom
[1] 488

$ttable
      Estimate     SE  t.value p.value
ar1     1.6474 0.0318  51.8379       0
ar2    -0.9023 0.0325 -27.7802       0
ma1    -1.2396 0.0754 -16.4371       0
ma2     0.5716 0.0682   8.3841       0
xmean   7.8237 0.0466 168.0616       0

$AIC
[1] 2.398894

$AICc
[1] 2.399144

$BIC
[1] 2.450016

Figure 9: Model 2 diagnosis (2,0,2)

Chosen model RMSE

The chosen model is ARIMA(5,1,1).

Code
model <- sarima(df.ts.discharge1980,5,1,1)
(optimizer iteration trace omitted; both optimization stages converged)

Code
(rmse_arima511 <- sqrt(mean(model$fit$sigma2)))
[1] 0.8049785

Forecast

Here, the chosen ARIMA(5,1,1) model is used to forecast the next 10 months. Figure 10 shows the forecast plot.

Code
###### Forecasting ###########
df.ts.discharge1980 %>%
  Arima(order=c(5,1,1),include.drift = FALSE) %>%
  forecast %>%
  autoplot() +
  ylab("Log discharge") + xlab("Time") +
  theme(aspect.ratio = 0.4)

Figure 10: Next 10 month forecasting
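The same kind of h-step forecast can also be produced with base R's predict() on a stats::arima fit. A minimal sketch on a simulated series (a simpler ARIMA(1,1,0) order is used here purely for illustration):

```r
# Sketch: 10-step-ahead forecast with base R's predict()
set.seed(3)
y <- ts(cumsum(rnorm(300)), frequency = 12)  # simulated integrated series

fit <- arima(y, order = c(1, 1, 0))  # illustrative order, not the chosen model
fc  <- predict(fit, n.ahead = 10)

fc$pred  # point forecasts for the next 10 steps
fc$se    # their standard errors, widening with the horizon
```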

Comparing with benchmark models

In this section, the ARIMA model is compared with three benchmark methods: mean, naive, and seasonal naive. From the diagnostics in Figure 11, the seasonal naive method performs better than the other two. The summaries of the ARIMA model and the seasonal naive model show that the chosen ARIMA(5,1,1) has lower RMSE and MAE. However, the seasonal naive method captures the seasonal pattern better in Figure 12, and its diagnostic plot (Figure 11 (c)) also looks better than that of the chosen model in Figure 8.

Code
####################### Benchmark methods ###############################
f1<-meanf(df.ts.discharge1980, h=10)  # mean method
checkresiduals(f1)  # serial correlation; Ljung-Box p < 0.05
f2<-naive(df.ts.discharge1980, h=10)  # naive method
checkresiduals(f2)  # serial correlation; Ljung-Box p < 0.05
f3<-snaive(df.ts.discharge1980, h=10) # seasonal naive method
checkresiduals(f3)  # serial correlation; Ljung-Box p < 0.05

    Ljung-Box test

data:  Residuals from Mean
Q* = 1216.2, df = 24, p-value < 2.2e-16

Model df: 0.   Total lags used: 24

    Ljung-Box test

data:  Residuals from Naive method
Q* = 157.81, df = 24, p-value < 2.2e-16

Model df: 0.   Total lags used: 24

    Ljung-Box test

data:  Residuals from Seasonal naive method
Q* = 358.05, df = 24, p-value < 2.2e-16

Model df: 0.   Total lags used: 24

(a) Mean method

(b) Naive method

(c) Seasonal naive method

Figure 11: Benchmark models diagnosis

Code
autoplot(df.ts.discharge1980) +
  autolayer(meanf(df.ts.discharge1980, h=10),
            series="Mean", PI=FALSE) +
  autolayer(naive(df.ts.discharge1980, h=10),
            series="Naïve", PI=FALSE) +
  autolayer(snaive(df.ts.discharge1980, h=10),
            series="Seasonal naïve", PI=FALSE) +
  ggtitle("Forecasts for monthly stream flow") +
  xlab("Year") + ylab("Log Discharge") +
  guides(colour=guide_legend(title="Forecast")) + 
  autolayer(for_sarima$pred, series="ARIMA(5,1,1)") + # for_sarima: predictions from the fitted ARIMA(5,1,1)
  theme(aspect.ratio = 0.4)

#seasonal Naive is good

Figure 12: Benchmark model forecasting

Summary of ARIMA model

Code
summary(fit)
Series: df.ts.discharge1980 
ARIMA(5,1,1) 

Coefficients:
         ar1     ar2      ar3      ar4      ar5      ma1
      0.5418  0.1138  -0.0597  -0.1059  -0.1895  -1.0000
s.e.  0.0442  0.0505   0.0507   0.0505   0.0445   0.0092

sigma^2 = 0.6569:  log likelihood = -594.95
AIC=1203.91   AICc=1204.14   BIC=1233.3

Training set error measures:
                      ME      RMSE       MAE       MPE     MAPE      MASE
Training set 0.004363739 0.8046979 0.6385257 -1.062096 8.377412 0.6903305
                    ACF1
Training set -0.01913352

Summary of Seasonal naive model

Code
summary(f3)

Forecast method: Seasonal naive method

Model Information:
Call: snaive(y = df.ts.discharge1980, h = 10) 

Residual sd: 1.1724 

Error measures:
                       ME     RMSE       MAE       MPE     MAPE MASE      ACF1
Training set -0.006577768 1.172446 0.9249566 -1.322599 12.26289    1 0.4562968

Forecasts:
         Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
Feb 2021       9.075799 7.573249 10.578349 6.777847 11.373752
Mar 2021       9.419143 7.916592 10.921693 7.121190 11.717095
Apr 2021       8.459100 6.956550  9.961651 6.161148 10.757053
May 2021       8.599828 7.097278 10.102379 6.301876 10.897781
Jun 2021       7.310991 5.808440  8.813541 5.013038  9.608943
Jul 2021       6.594944 5.092394  8.097495 4.296992  8.892897
Aug 2021       6.224302 4.721751  7.726852 3.926349  8.522254
Sep 2021       6.243943 4.741393  7.746494 3.945991  8.541896
Oct 2021       6.453248 4.950697  7.955798 4.155295  8.751200
Nov 2021       7.269523 5.766973  8.772074 4.971571  9.567476

This indicates that a seasonal effect needs to be considered in the model.

SARIMA model

In the last section, the ARIMA model did not outperform the benchmark models, which indicates that a seasonal effect needs to be considered. Thus, in this section, a SARIMA model is applied to the stream flow time series. Based on Figure 5, the search ranges (matching the function call below) are p = 0,1,2 (PACF), q = 0,...,7 (ACF), d = 1,2,3, P = 0,1,2, Q = 0,1,2, with D = 1.

Code
######################## Check for different combinations ########


# write a function to search over SARIMA orders
SARIMA.c = function(p1, p2, q1, q2, d1, d2, P1, P2, Q1, Q2, data){
  
  D = 1
  s = 12
  
  i = 1
  temp = data.frame()
  ls = matrix(rep(NA, 9 * 250), nrow = 250)
  
  for (p in p1:p2)
  {
    for (q in q1:q2)
    {
      for (d in d1:d2)
      {
        for (P in P1:P2)
        {
          for (Q in Q1:Q2)
          {
            if (p + d + q + P + Q + D <= 12)
            {
              skip_to_next <- FALSE
              model <- tryCatch({
                Arima(data, order = c(p-1, d, q-1), seasonal = c(P-1, D, Q-1))
              }, error = function(e) { skip_to_next <<- TRUE })
              if (skip_to_next) { next }
              ls[i, ] = c(p-1, d, q-1, P-1, D, Q-1, model$aic, model$bic, model$aicc)
              i = i + 1
            }
          }
        }
      }
    }
  }
  
  temp = as.data.frame(ls)
  names(temp) = c("p", "d", "q", "P", "D", "Q", "AIC", "BIC", "AICc")
  
  temp
}
Code
# ACF plot: q = 0,...,7 and Q = 0,1,2; PACF plot: p = 0,1,2 and P = 0,1,2; D = 1; d = 1,2,3
output=SARIMA.c(p1=1,p2=3,q1=1,q2=8,d1=1,d2=3,P1=1,P2=3,Q1=1,Q2=3,data=df.ts.discharge1980)
#output

knitr::kable(output)
p d q P D Q AIC BIC AICc
0 1 0 0 1 0 1554.711 1558.885 1554.720
0 1 0 0 1 1 1270.686 1279.033 1270.711
0 1 0 0 1 2 1272.633 1285.154 1272.683
0 1 0 1 1 0 1430.233 1438.580 1430.258
0 1 0 1 1 1 1272.634 1285.156 1272.685
0 1 0 1 1 2 1273.807 1290.502 1273.891
0 1 0 2 1 0 1367.976 1380.497 1368.026
0 1 0 2 1 1 1274.519 1291.214 1274.603
0 1 0 2 1 2 1276.416 1297.285 1276.542
0 2 0 0 1 0 2030.439 2034.611 2030.448
0 2 0 0 1 1 1759.319 1767.662 1759.344
0 2 0 0 1 2 1761.006 1773.522 1761.057
0 2 0 1 1 0 1917.611 1925.954 1917.636
0 2 0 1 1 1 1761.031 1773.546 1761.082
0 2 0 1 1 2 1761.734 1778.421 1761.819
0 2 0 2 1 0 1841.816 1854.331 1841.867
0 2 0 2 1 1 1762.431 1779.118 1762.515
0 2 0 2 1 2 1763.684 1784.543 1763.811
0 3 0 0 1 0 2579.920 2584.089 2579.928
0 3 0 0 1 1 2316.845 2325.184 2316.870
0 3 0 0 1 2 2318.307 2330.816 2318.357
0 3 0 1 1 0 2473.464 2481.803 2473.489
0 3 0 1 1 1 2318.364 2330.873 2318.415
0 3 0 1 1 2 2318.281 2334.960 2318.366
0 3 0 2 1 0 2388.689 2401.198 2388.739
0 3 0 2 1 1 2319.315 2335.994 2319.400
0 3 0 2 1 2 2319.809 2340.657 2319.936
0 1 1 0 1 0 1466.645 1474.993 1466.670
0 1 1 0 1 1 1167.104 1179.625 1167.155
0 1 1 0 1 2 1168.861 1185.556 1168.945
0 1 1 1 1 0 1327.195 1339.716 1327.246
0 1 1 1 1 1 1168.861 1185.556 1168.945
0 1 1 1 1 2 1170.867 1191.736 1170.993
0 1 1 2 1 0 1275.872 1292.567 1275.956
0 1 1 2 1 1 1170.860 1191.729 1170.987
0 1 1 2 1 2 1172.861 1197.903 1173.038
0 2 1 0 1 0 1566.434 1574.777 1566.459
0 2 1 0 1 1 1287.852 1300.367 1287.903
0 2 1 0 1 2 1289.805 1306.492 1289.890
0 2 1 1 1 0 1442.993 1455.508 1443.044
0 2 1 1 1 1 1289.807 1306.494 1289.891
0 2 1 1 1 2 1290.980 1311.838 1291.106
0 2 1 2 1 0 1381.464 1398.151 1381.548
0 2 1 2 1 1 1291.683 1312.542 1291.810
0 2 1 2 1 2 1293.581 1318.611 1293.759
0 3 1 0 1 0 2031.477 2039.817 2031.503
0 3 1 0 1 1 1765.597 1778.106 1765.648
0 3 1 0 1 2 1767.309 1783.987 1767.393
0 3 1 1 1 0 1919.648 1932.157 1919.699
0 3 1 1 1 1 1767.332 1784.011 1767.417
0 3 1 1 1 2 1768.037 1788.885 1768.164
0 3 1 2 1 0 1844.661 1861.339 1844.745
0 3 1 2 1 1 1768.698 1789.546 1768.825
0 1 2 0 1 0 1456.161 1468.682 1456.211
0 1 2 0 1 1 1153.603 1170.298 1153.687
0 1 2 0 1 2 1154.982 1175.851 1155.109
0 1 2 1 1 0 1320.001 1336.696 1320.085
0 1 2 1 1 1 1155.017 1175.886 1155.143
0 1 2 1 1 2 1157.603 1182.645 1157.780
0 1 2 2 1 0 1256.621 1277.490 1256.747
0 1 2 2 1 1 1156.561 1181.604 1156.739
0 1 2 2 1 2 1157.240 1186.457 1157.478
0 2 2 0 1 0 1480.074 1492.589 1480.124
0 2 2 0 1 1 1186.284 1202.970 1186.368
0 2 2 0 1 2 1188.093 1208.952 1188.220
0 2 2 1 1 0 1341.823 1358.510 1341.907
0 2 2 1 1 1 1188.094 1208.953 1188.221
0 2 2 1 1 2 1190.109 1215.139 1190.287
0 2 2 2 1 0 1291.215 1312.074 1291.342
0 2 2 2 1 1 1190.092 1215.122 1190.270
0 3 2 0 1 0 1574.826 1587.335 1574.876
0 3 2 0 1 1 1294.651 1311.330 1294.736
0 3 2 0 1 2 1296.555 1317.404 1296.683
0 3 2 1 1 0 1452.387 1469.066 1452.472
0 3 2 1 1 1 1296.565 1317.413 1296.692
0 3 2 2 1 0 1391.470 1412.318 1391.597
0 1 3 0 1 0 1420.254 1436.949 1420.338
0 1 3 0 1 1 1127.819 1148.688 1127.946
0 1 3 0 1 2 1129.734 1154.777 1129.912
0 1 3 1 1 0 1293.812 1314.681 1293.939
0 1 3 1 1 1 1129.740 1154.783 1129.917
0 1 3 1 1 2 1130.700 1159.916 1130.937
0 1 3 2 1 0 1232.385 1257.428 1232.563
0 1 3 2 1 1 1131.197 1160.413 1131.434
0 2 3 0 1 0 1470.527 1487.214 1470.611
0 2 3 0 1 1 1174.511 1195.369 1174.638
0 2 3 0 1 2 1176.046 1201.077 1176.224
0 2 3 1 1 0 1335.462 1356.321 1335.589
0 2 3 1 1 1 1176.075 1201.105 1176.253
0 2 3 2 1 0 1273.689 1298.720 1273.867
0 3 3 0 1 0 1493.798 1510.477 1493.883
0 3 3 0 1 1 1199.306 1220.154 1199.433
0 3 3 1 1 0 1356.308 1377.156 1356.435
0 1 4 0 1 0 1413.964 1434.833 1414.090
0 1 4 0 1 1 1122.773 1147.816 1122.950
0 1 4 0 1 2 1124.706 1153.923 1124.943
0 1 4 1 1 0 1287.435 1312.478 1287.612
0 1 4 1 1 1 1124.710 1153.927 1124.947
0 1 4 2 1 0 1227.854 1257.071 1228.091
0 2 4 0 1 0 1440.639 1461.497 1440.766
0 2 4 0 1 1 1152.585 1177.615 1152.763
0 2 4 1 1 0 1315.154 1340.184 1315.332
0 3 4 0 1 0 1485.402 1506.250 1485.529
0 1 5 0 1 0 1414.965 1440.007 1415.142
0 1 5 0 1 1 1121.012 1150.228 1121.249
0 1 5 1 1 0 1284.238 1313.454 1284.475
0 2 5 0 1 0 1433.850 1458.881 1434.028
0 1 6 0 1 0 1414.000 1443.217 1414.237
1 1 0 0 1 0 1494.539 1502.887 1494.565
1 1 0 0 1 1 1198.259 1210.780 1198.309
1 1 0 0 1 2 1200.169 1216.864 1200.253
1 1 0 1 1 0 1357.988 1370.509 1358.038
1 1 0 1 1 1 1200.169 1216.864 1200.253
1 1 0 1 1 2 1202.123 1222.992 1202.249
1 1 0 2 1 0 1309.527 1326.222 1309.611
1 1 0 2 1 1 1202.168 1223.037 1202.294
1 1 0 2 1 2 1204.121 1229.164 1204.299
1 2 0 0 1 0 1802.468 1810.812 1802.493
1 2 0 0 1 1 1513.828 1526.343 1513.878
1 2 0 0 1 2 1515.824 1532.511 1515.909
1 2 0 1 1 0 1674.889 1687.404 1674.939
1 2 0 1 1 1 1515.824 1532.511 1515.909
1 2 0 1 1 2 1517.828 1538.686 1517.954
1 2 0 2 1 0 1624.087 1640.774 1624.171
1 2 0 2 1 1 1517.731 1538.589 1517.857
1 2 0 2 1 2 1519.821 1544.851 1519.999
1 3 0 0 1 0 2224.197 2232.536 2224.223
1 3 0 0 1 1 1942.277 1954.786 1942.327
1 3 0 0 1 2 1944.162 1960.841 1944.247
1 3 0 1 1 0 2102.401 2114.910 2102.452
1 3 0 1 1 1 1944.166 1960.845 1944.251
1 3 0 1 1 2 1946.173 1967.021 1946.300
1 3 0 2 1 0 2045.283 2061.961 2045.367
1 3 0 2 1 1 1945.994 1966.842 1946.122
1 1 1 0 1 0 1414.035 1426.557 1414.086
1 1 1 0 1 1 1124.142 1140.837 1124.227
1 1 1 0 1 2 1126.113 1146.982 1126.239
1 1 1 1 1 0 1287.709 1304.404 1287.793
1 1 1 1 1 1 1126.114 1146.983 1126.241
1 1 1 1 1 2 1126.707 1151.750 1126.885
1 1 1 2 1 0 1224.966 1245.835 1225.093
1 1 1 2 1 1 1127.752 1152.795 1127.930
1 1 1 2 1 2 1129.615 1158.832 1129.852
1 2 1 0 1 0 1506.986 1519.501 1507.036
1 2 1 0 1 1 1216.317 1233.004 1216.402
1 2 1 0 1 2 1218.253 1239.111 1218.380
1 2 1 1 1 0 1371.560 1388.247 1371.645
1 2 1 1 1 1 1218.253 1239.111 1218.380
1 2 1 1 1 2 1220.200 1245.231 1220.378
1 2 1 2 1 0 1323.702 1344.561 1323.829
1 2 1 2 1 1 1220.253 1245.283 1220.431
1 3 1 0 1 0 1804.955 1817.464 1805.006
1 3 1 0 1 1 1521.987 1538.665 1522.071
1 3 1 0 1 2 1523.975 1544.823 1524.102
1 3 1 1 1 0 1678.454 1695.132 1678.538
1 3 1 1 1 1 1523.976 1544.824 1524.103
1 3 1 2 1 0 1628.289 1649.137 1628.416
1 1 2 0 1 0 1410.802 1427.497 1410.886
1 1 2 0 1 1 1118.743 1139.612 1118.870
1 1 2 0 1 2 1120.661 1145.703 1120.838
1 1 2 1 1 0 1280.183 1301.052 1280.310
1 1 2 1 1 1 1120.664 1145.706 1120.841
1 1 2 1 1 2 1122.737 1151.953 1122.974
1 1 2 2 1 0 1222.756 1247.799 1222.934
1 1 2 2 1 1 1122.494 1151.710 1122.731
1 2 2 0 1 0 1434.358 1451.045 1434.442
1 2 2 0 1 1 1147.844 1168.702 1147.971
1 2 2 0 1 2 1150.258 1175.288 1150.436
1 2 2 1 1 0 1309.900 1330.758 1310.027
1 2 2 1 1 1 1150.888 1175.918 1151.066
1 2 2 2 1 0 1249.768 1274.798 1249.946
1 3 2 0 1 0 1516.622 1533.301 1516.707
1 3 2 0 1 1 1225.311 1246.159 1225.438
1 3 2 1 1 0 1382.550 1403.398 1382.677
1 1 3 0 1 0 1411.820 1432.689 1411.946
1 1 3 0 1 1 1119.269 1144.311 1119.446
1 1 3 0 1 2 1121.169 1150.386 1121.407
1 1 3 1 1 0 1281.252 1306.295 1281.429
1 1 3 1 1 1 1121.173 1150.390 1121.411
1 1 3 2 1 0 1224.540 1253.757 1224.778
1 2 3 0 1 0 1467.266 1488.124 1467.392
1 2 3 0 1 1 1143.565 1168.596 1143.743
1 2 3 1 1 0 1337.775 1362.805 1337.953
1 3 3 0 1 0 1501.151 1521.999 1501.278
1 1 4 0 1 0 1404.794 1429.837 1404.972
1 1 4 0 1 1 1121.052 1150.269 1121.289
1 1 4 1 1 0 1283.086 1312.302 1283.323
1 2 4 0 1 0 1437.959 1462.990 1438.137
1 1 5 0 1 0 1415.661 1444.877 1415.898
2 1 0 0 1 0 1483.043 1495.564 1483.094
2 1 0 0 1 1 1186.255 1202.950 1186.339
2 1 0 0 1 2 1188.131 1209.000 1188.258
2 1 0 1 1 0 1344.974 1361.669 1345.058
2 1 0 1 1 1 1188.131 1208.999 1188.257
2 1 0 1 1 2 1190.107 1215.149 1190.284
2 1 0 2 1 0 1294.571 1315.440 1294.698
2 1 0 2 1 1 1190.128 1215.170 1190.305
2 1 0 2 1 2 1192.109 1221.325 1192.346
2 2 0 0 1 0 1711.439 1723.954 1711.489
2 2 0 0 1 1 1420.433 1437.119 1420.517
2 2 0 0 1 2 1422.422 1443.281 1422.549
2 2 0 1 1 0 1579.997 1596.684 1580.081
2 2 0 1 1 1 1422.422 1443.281 1422.549
2 2 0 1 1 2 1424.399 1449.430 1424.577
2 2 0 2 1 0 1532.658 1553.517 1532.785
2 2 0 2 1 1 1424.409 1449.439 1424.586
2 3 0 0 1 0 2042.722 2055.230 2042.772
2 3 0 0 1 1 1764.449 1781.127 1764.533
2 3 0 0 1 2 1766.211 1787.059 1766.338
2 3 0 1 1 0 1925.007 1941.686 1925.092
2 3 0 1 1 1 1766.222 1787.070 1766.349
2 3 0 2 1 0 1874.001 1894.849 1874.128
2 1 1 0 1 0 1410.127 1426.822 1410.211
2 1 1 0 1 1 1117.561 1138.430 1117.687
2 1 1 0 1 2 1119.461 1144.504 1119.639
2 1 1 1 1 0 1279.405 1300.274 1279.532
2 1 1 1 1 1 1119.465 1144.508 1119.643
2 1 1 1 1 2 1120.457 1149.673 1120.694
2 1 1 2 1 0 1222.582 1247.625 1222.760
2 1 1 2 1 1 1121.283 1150.500 1121.521
2 2 1 0 1 0 1495.823 1512.510 1495.908
2 2 1 0 1 1 1204.650 1225.509 1204.777
2 2 1 0 1 2 1206.558 1231.588 1206.736
2 2 1 1 1 0 1358.898 1379.757 1359.025
2 2 1 1 1 1 1206.557 1231.587 1206.735
2 2 1 2 1 0 1309.131 1334.162 1309.309
2 3 1 0 1 0 1714.820 1731.499 1714.905
2 3 1 0 1 1 1429.491 1450.339 1429.618
2 3 1 1 1 0 1584.485 1605.334 1584.613
2 1 2 0 1 0 1412.086 1432.955 1412.213
2 1 2 0 1 1 1119.529 1144.571 1119.706
2 1 2 0 1 2 1121.430 1150.647 1121.667
2 1 2 1 1 0 1281.374 1306.416 1281.551
2 1 2 1 1 1 1121.434 1150.651 1121.671
2 1 2 2 1 0 1227.730 1256.946 1227.967
2 2 2 0 1 0 1430.807 1451.666 1430.934
2 2 2 0 1 1 1141.894 1166.924 1142.072
2 2 2 1 1 0 1365.832 1390.863 1366.010
2 3 2 0 1 0 1506.043 1526.891 1506.170
2 1 3 0 1 0 1401.231 1426.274 1401.409
2 1 3 0 1 1 1120.594 1149.810 1120.831
2 1 3 1 1 0 1283.342 1312.558 1283.579
2 2 3 0 1 0 1433.213 1458.244 1433.391
2 1 4 0 1 0 1400.181 1429.397 1400.418
Code
output[which.min(output$AIC),] 
    p d q P D Q      AIC     BIC     AICc
214 2 1 1 0 1 1 1117.561 1138.43 1117.687
Code
output[which.min(output$BIC),]
    p d q P D Q      AIC     BIC     AICc
214 2 1 1 0 1 1 1117.561 1138.43 1117.687
Code
output[which.min(output$AICc),]
    p d q P D Q      AIC     BIC     AICc
214 2 1 1 0 1 1 1117.561 1138.43 1117.687

The AIC, AICc, and BIC all select the same model, SARIMA(2,1,1,0,1,1). SARIMA(1,1,2,0,1,1) also performs well compared to the others.

Cross validation with 1-step-ahead forecasts for SARIMA

In this section, a seasonal cross validation using 1-step-ahead forecasts is performed to compare the two candidate models.

Code
k <- 200 # minimum data length for fitting a model; shorter training windows can lead to unstable (non-stationary) fits
n <- length(df.ts.discharge1980)


i=1
err1 = c()
err2 = c()

for(i in 1:(n-k))
{
  xtrain <- df.ts.discharge1980[1:(k-1)+i] # rolling training window: observations (i+1) to (i+k-1)
  xtest <- df.ts.discharge1980[k+i] # the (k+i)-th observation as the test point
  
  fit <- Arima(xtrain, order=c(2,1,1), seasonal=c(0,1,1))
  fcast1 <- forecast(fit, h=1)
  
  fit2 <- Arima(xtrain, order=c(1,1,2), seasonal=c(0,1,1))
  fcast2 <- forecast(fit2, h=1)
  
  # capture the absolute error for each iteration;
  # squared errors for RMSE are computed from err1/err2 after the loop
  err1 = c(err1, abs(fcast1$mean-xtest)) 
  err2 = c(err2, abs(fcast2$mean-xtest))
  
}

(MAE1=mean(err1)) # This is mean absolute error
[1] 0.7558021
Code
(MAE2=mean(err2)) 
[1] 0.7619506
Code
RMSE1=sqrt(mean(err1^2)) # SARIMA(2,1,1,0,1,1)
RMSE2=sqrt(mean(err2^2)) # SARIMA(1,1,2,0,1,1)

RMSE1
[1] 0.9324466
Code
RMSE2
[1] 0.9406275

Based on cross validation, SARIMA(2,1,1,0,1,1) is the better of the two models, with the lower RMSE (0.93 vs. 0.94).
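The rolling 1-step-ahead scheme used above can be sketched compactly with base R's arima() on a simulated series (the window size and model order here are illustrative, not those of the discharge model):

```r
# Sketch: rolling-origin 1-step-ahead cross validation with base R
set.seed(5)
y <- arima.sim(model = list(ar = 0.7), n = 160)  # simulated AR(1) series

k <- 120          # minimum training length (illustrative)
err <- numeric(0)
for (i in 0:(length(y) - k - 1)) {
  train <- y[(1 + i):(k + i)]        # rolling window of k observations
  test  <- y[k + i + 1]              # next observation as the test point
  fit   <- arima(train, order = c(1, 0, 0))
  pred  <- predict(fit, n.ahead = 1)$pred
  err   <- c(err, abs(pred - test))  # 1-step absolute error
}
mae <- mean(err)
mae
```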

Model diagnosis

Code
set.seed(123)
model_output <- capture.output(sarima(df.ts.discharge1980, 2,1,1,0,1,1,12))
cat(model_output[51:75], model_output[length(model_output)], sep = "\n") 
Coefficients:
         ar1     ar2      ma1     sma1
      0.3751  0.1335  -1.0000  -1.0000
s.e.  0.0452  0.0454   0.0111   0.0524

sigma^2 estimated as 0.5251:  log likelihood = -553.78,  aic = 1117.56

$degrees_of_freedom
[1] 476

$ttable
     Estimate     SE  t.value p.value
ar1    0.3751 0.0452   8.2902  0.0000
ar2    0.1335 0.0454   2.9416  0.0034
ma1   -1.0000 0.0111 -90.2029  0.0000
sma1  -1.0000 0.0524 -19.0976  0.0000

$AIC
[1] 2.328251

$AICc
[1] 2.328427

$BIC
[1] 2.371728

Figure 13: Final SARIMA model diagnosis

The final model diagnostics (Figure 13) show that the p-values of the Ljung-Box statistic still exhibit some pattern. The ACF of the residuals and the Q-Q plot suggest the residuals are close to white noise.
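The Ljung-Box statistic behind these p-values can be computed directly with base R's Box.test(); a sketch on a simulated AR(1) fit, with fitdf set to the number of estimated ARMA coefficients:

```r
# Sketch: Ljung-Box test on ARIMA residuals with base R
set.seed(9)
y <- arima.sim(model = list(ar = 0.6), n = 400)  # simulated AR(1) series

fit <- arima(y, order = c(1, 0, 0))
lb  <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = 1)

lb$p.value  # small values indicate remaining autocorrelation in the residuals
```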

Compare with Benchmark methods

Code
fit <- Arima(df.ts.discharge1980, order=c(2,1,1), seasonal=c(0,1,1))

autoplot(df.ts.discharge1980) +
  autolayer(meanf(df.ts.discharge1980, h=24),
            series="Mean", PI=FALSE) +
  autolayer(naive(df.ts.discharge1980, h=24),
            series="Naïve", PI=FALSE) +
  autolayer(snaive(df.ts.discharge1980, h=24),
            series="SNaïve", PI=FALSE)+
  autolayer(rwf(df.ts.discharge1980, h=24, drift=TRUE),
            series="Drift", PI=FALSE)+
  autolayer(forecast(fit,24), 
            series="fit",PI=FALSE) +
  guides(colour=guide_legend(title="Forecast"))

Figure 14: Benchmark models

From Figure 14, the seasonal naive method seems to give better predictions than the other benchmarks, while the fitted SARIMA model produces forecasts of smaller magnitude than seasonal naive. In the following, RMSE and MAE are used to compare the performance of these two models.
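For reference, the seasonal naive forecast is simple enough to compute by hand: each forecast repeats the value observed one seasonal period (here 12 months) earlier. A base-R sketch with illustrative values:

```r
# Sketch: seasonal naive forecast by hand for a monthly series
y <- ts(1:36, frequency = 12)  # illustrative monthly values
h <- 10                        # forecast horizon
n <- length(y)
f <- 12                        # seasonal period

# h-step seasonal naive: repeat the most recent full season
snaive_fc <- as.numeric(y)[n - f + ((seq_len(h) - 1) %% f) + 1]
snaive_fc  # 25 26 ... 34: last season's values repeated
```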

Summary of Seasonal naive model

Code
f2 <- snaive(df.ts.discharge1980, h=24) 

accuracy(f2)
                       ME     RMSE       MAE       MPE     MAPE MASE      ACF1
Training set -0.006577768 1.172446 0.9249566 -1.322599 12.26289    1 0.4562968

Summary of SARIMA(2,1,1,0,1,1).

Code
summary(fit)
Series: df.ts.discharge1980 
ARIMA(2,1,1)(0,1,1)[12] 

Coefficients:
         ar1     ar2      ma1     sma1
      0.3751  0.1335  -1.0000  -1.0000
s.e.  0.0452  0.0454   0.0111   0.0524

sigma^2 = 0.5295:  log likelihood = -553.78
AIC=1117.56   AICc=1117.69   BIC=1138.43

Training set error measures:
                     ME      RMSE       MAE        MPE     MAPE      MASE
Training set 0.02658673 0.7150316 0.5639316 -0.5131699 7.377803 0.6096843
                     ACF1
Training set 0.0005123345

These statistics show that the final SARIMA(2,1,1,0,1,1) model has lower RMSE and MAE, and is therefore a better model than the seasonal naive benchmark.